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Abstract The need to blend observational data and mathematical models arises in 
many applications and leads naturally to inverse problems. Parameters appearing in 
the model, such as constitutive tensors, initial conditions, boundary conditions, and 
forcing can be estimated on the basis of observed data. The resulting inverse prob- 
lems are often ill-posed and some form of regularization is required. These notes dis- 
cuss parameter estimation in situations where the unknown parameters vary across 
multiple scales. We illustrate the main ideas using a simple model for groundwater 
flow. 

We will highlight various approaches to regularization for inverse problems, in- 
cluding Tikhonov and Bayesian methods. We illustrate three ideas that arise when 
considering inverse problems in the multiscale context. The first idea is that the 
choice of space or set in which to seek the solution to the inverse problem is inti- 
mately related to whether a homogenized or full multiscale solution is required. This 
is a choice of regularization. The second idea is that, if a homogenized solution to 
the inverse problem is what is desired, then this can be recovered from carefully de- 
signed observations of the full multiscale system. The third idea is that the theory of 
homogenization can be used to improve the estimation of homogenized coefficients 
from multiscale data. 
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1 Introduction 

The objective of this overview is to demonstrate the important role of multiscale 
modelling in the solution of inverse problems for differential equations. The main 
inverse problem we discuss is that of determining unknown parameters by match- 
ing observed data to a differential equation model involving those parameters. The 
unknown parameters may be functions, in general, and they may have variation 
over multiple (length) scales. This multiscale structure makes the forward problem 
more challenging: numerically computing the solution to the differential equation 
requires very high resolution. The multiscale structure also complicates the inverse 
problem. Should we try to fit the data with a high-dimensional parameter, or should 
we seek a low-dimensional "homogenized" approximation of the parameter? If a 
low-dimensional parameter model is used, how should we account for the mismatch 
between the true parameters and the low-dimensional representation? After obtain- 
ing a solution to the inverse problem, one typically wants to make further predictions 
using whatever parameter is fit to the observed data, so it is important to consider 
whether a low-dimensional representation of the unknown parameter is sufficient to 
make additional predictions. 

Throughout these notes the unknown parameters will be denoted by u 6 X; typi- 
cally u is a function assumed to lie in a Banach space X. We use y <G Y to denote the 
data (for simplicity we often take Y = Mr) and z to denote the predicted quantity, as- 
sumed to be an element of a Banach space Z or, in some cases, a Z— valued random 
variable. The map §f : X — > M. N denotes the forward mapping from the unknown pa- 
rameter to the data, and ^ :X — > Z {or : X x Q ^ Z in the random case) denotes 
the forward mapping from the parameter to the prediction. We sometimes refer to 
§f as the observation operator and .'P as the prediction operator. Both Sf and are 
typically derived from a common solution operator G : X — > P mapping u G X to 
the solution G(u) G P of a partial differential equation (PDE), where P is a Banach 
space. For example Sf may be derived by composing G with N linear functionals. 

The ideal inverse problem is to determine u G X from knowledge of y € M. N 
where it is assumed that y = ^(u). In practice, however, the data y is generated 
from outside this clean mathematical model, so it is natural to think of the data y as 
being given by 

y = &(u) + $ (1) 

for some t, G M. N quantifying model erroiQ and observational noise. The value of B, 
is not known, but it is common in applications to assume that some of its statistical 
properties are known and these can then be built into the methods used to estimate 
u. Once the function u is determined by solving this inverse problem, it can be used 
to make a prediction z = ,^(u). 

We illustrate three ideas that arise when attempting to solve the inverse problem 
defined by (Q3 in the multiscale context: 



1 Model error can be incorporated within the set of unknown parameters u and estimated using 
data; however this idea is not pursued here. 
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• (a) The choice of the space or set in which to seek the solution to the inverse prob- 
lem is intimately related to whether a low-dimensional "homogenized" solution 
or a high-dimensional "multiscale" solution is required for predictive capability. 
This is a choice of regularization. 

• (b) If a homogenized solution to the inverse problem is desired, then this can be 
recovered from carefully designed observations of the full multiscale system. 

• (c) The theory of homogenization can be used to improve the estimation of ho- 
mogenized parameters from observations of multiscale data. 

In Section 12 we consider in detail a worked example which exemplifies the use 
of multiscale methods to approximate the forward problems Sf and & for data and 
predictions; this example will be used to illustrate many of the general ideas devel- 
oped in these notes, and the three ideas (a)-(c) in particular. Section[3]is devoted to a 
brief overview of regularization techniques for inverse problems, and to discussion 
of the idea (a). Section|4]is devoted to the idea (b). We study the problem of estimat- 
ing a single scalar parameter in a homogenized model of groundwater flow, given 
data which is generated by a full multiscale model. This may be seen as a surrogate 
for understanding the use of real-world data (which is typically multiscale in char- 
acter) to estimate parameters in simpler homogenized models. Section[5]is devoted 
to the idea (c). We study the use of ideas from multiscale methodology to enhance 
parameter estimation techniques for homogenized models. The viewpoint taken is 
that the statistics of the error £ appearing in (Q~|i can be understood using the theory 
of homogenization for random media; when these statistical properties depend on 
the unknown parameter u the noise £, is no longer additive and its dependence on u 
plays an important role in the parameter estimation process. 



1.1 Notation 

The following notation will be used throughout. We use | • | to denote the Euclidean 
norm on R m (for possibly different choices of m). We let S d (resp. S ' + ) denote the 
set of symmetric (resp. positive-definite) second order tensors on Mr. If F £ S d ' + , we 
define the weighted norm | • \p = |.T _ 3 • | on R m . Throughout the notes, X is a Banach 
space, containing the functions that we wish to estimate, and E a Banach space 
compactly embedded into X. When studying the inverse problem from a Bayesian 
perspective we will use Gaussian priors on X, defined via a covariance operator 
on a Hilbert space H D X, with norm || • ||#. In this situation E will be the Hilbert 
space with norm W^^ 1 ■ \\h- 
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1.2 Running Example 



We consider a model for groundwater flow in a medium with permeability tensor k, 
pressure p and Darcy velocity v (or the volume flux of water per unit area) related 
to the pressure via the Darcy law: 

v = ~(Vp-pgiz) (2) 
M 

where fi is the fluid viscosity, p is the fluid density, g is the acceleration due to 
gravity and e z is the unit vector in the z-direction. We choose units in which jj. = 1 . 
We also assume that we have a constant density fluid and redefine the pressure by 
adding pgz (z is the vertical direction) to write (f2]i in the form v = —kVp. Assum- 
ing that the Darcy velocity is divergence-free, except at certain known source/sink 
locations, we obtain the following elliptic equation for the pressure: 

V-v = /, xeD, 

p = 0, x € 3D, (3) 
v = -kVp 

where D C M. d is an open and bounded set with regular boundary, and / is assumed 
to be known. The permeability tensor field k, however, is assumed to be unknown 
and must be determined from data. In order to make the elliptic PDE (© for the 
pressure p well-posed, we assume that the permeability tensor k(x) is an element of 
S d ' + and so we write it as the (tensor) exponential: k(x) = exp(u(x)), u £ S d . It is 
natural to view u as an element of X : = LT (D; S ) and to consider weak solutions of 
© with/ EH-^D). Then we have a unique solution p £ Hq (D) satisfying 

IIVpH^dexpaMWII/lltf-i, (4) 

for some c\ > depending only on d and D, and ||m||x being the essential supremum 
of the spectral radius of the matrix u(x), as x varies over D: 




Thus we may define G : X — > Hq(D) by G(u) = p. Now consider a set of real- 
valued continuous linear functionals £j : H l (D) — s- R and define 5f : X — > by 
&{u)j — tj(G(uj). The inverse problem is to determine u £ X from y £ M. N where 
it is assumed that y is given by ([TJ. Using (|4|i one may show that G : X — >• Hq (D) 
(resp. : X —> Mr 1 ) is Lipschitz. Indeed if p, denotes the solution to (0 with log 
permeability m, then, we have 

||Vpi-Vp2|| i2 < (c 1 ) 2 ||M 1 -M2||zexp(2(||M 1 ||x + ||M 2 |U)) ll/Hff-i. (5) 
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Study of the transport of contaminants in groundwater flow is a natural example 
of a useful prediction that can be made once the inverse problem is solved. To model 
this scenario we consider a particle x(t) £ M. d which is advected by the the ground- 
water velocity field v/0, where is the porosity of the rock and v is the Darcy ve- 
locity field from (0), and subject to diffusion with coefficient 2rj . Assuming that the 
contaminant is initially at jcimt we obtain the stochastic differential equation (SDE): 



where W(t) is a standard Brownian motion on W 1 . If we are interested in predicting 
the location of the contaminant at time T then our prediction will be the function 
J^t) given by (u) = x(T). Here for each fixed 77 <G [Q,°°) the function maps 
X into the family of W 1 — valued random variables. 



2 The Forward Problem: Multiscale Properties 

Some inverse problems arising in applications have the property that the forward 
model 5f mapping the unknown to the data will produce similar output on both 
highly oscillatory functions u and on appropriately chosen smoothly varying func- 
tions u. Furthermore, for some choices of prediction function J£" the predictions 
themselves will also be close for both highly oscillatory functions u and on appro- 
priately chosen smoothly varying functions u. These properties can be seen from an 
application of multiscale analysis, and we illustrate them by considering the prob- 
lem introduced in Section 11.21 There are many texts on the theory of multiscale 
analysis. For example, the basic homogenization theorems discussed here are devel- 
oped in J6). A recent overview of the subject, with many other references and using 
the same notational conventions that we adopt here, is 11241 . 

We consider a multiscale version of the running example from Section [L2l where 
the permeability tensor is k = K £ (x) = K(x,x/e) where K : D x T d — >• S d,+ is peri- 
odic in the second argument, e > a small parameter. For now we have assumed 
periodic dependence on the fast scale in K £ ; however we will generalize this to 
random dependence in later developments. 

With this permeability we obtain the family of problems 



If we set 77 = erjo, then the transport of contaminants is given by the SDE 




(6) 




/, xeD, 
0, x e 3D, 
-K £ Vp £ . 



(7a) 
(7b) 
(7c) 



dx e = 




dt+ \Z2rjoedW, x £ (0) 



(8) 
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Standard techniques from the theory of homogenization for elliptic PDEs can be 
used to show that for e small, 

p £ {x) a pl{x) := p (x)+e Pl {x, ^) (9) 

where po and p\ are defined as follows. First we define the effective (homogenized) 
permeability tensor Kq via solution of the cell problem for x(x,y): 

-Vy{V y XK T )=VyK T , yeT 1 . (10) 

Then 

Kq(x)= f Q(x,y)dy, (11) 

Q(x,y)=K(x,y)+K(x,y)V yX (x,y) T . (12) 

In this sense we observe that the effective diffusivity Kq(x) is the average of Q(x,y) 
over the fast scale y. This is not equal to the average of K (x,y) over y, except in 
trivial cases. We denote by uq the logarithm of Kq so that Kq = exp(«rj). 
The function po solves the (e independent) elliptic PDE 

V-v =/, xeD, (13a) 

Po = g, x e 3D, (13b) 

vo = -K Q Vp . (13c) 

and the corrector p\ is given by 

pi(x,y)=x(x,y)-Vpo(x). (14) 

Note that (TTOb may be written as 

-V r (e r )=0, yeT 1 . (15) 

This shows that Q, which is averaged to give the effective permeability tensor, is 
divergence-free with respect to the fast variable y. 

It is possible to prove that, in the limit as e — > 0, solutions to (O converge to solu- 
tions to ( fT3l l, the convergence being strong in Lr{D) and weak in H l (D) lfT0l [Tl l24ll . 
However if we want to prove strong convergence in H l then we need to include 
information about the corrector term p\. The following theorem and corollary sum- 
marize these ideas. For proofs see [fl~), or the discussion in the texts |[T0ll24l . 

Theorem 1. Let p £ and po be the solutions of (O and (1131 l. Assume that f G C°°{D) 
andthatK{x,y) G C°°(D;Cp el .(T c/ )). Then 

1^11^-^11^1=0. (16) 

£->0 
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Corollary 1. Under the same conditions as in Theorem\l\we have 

!Ip £ -Po|| L 2^0 and ||V / , £ -(/ + Zy (.,./e) r )V / , || L2 ^0 

as £ -> 0. 

In fact it is frequently the case that the convergence in Theorem Q] may be ob- 
tained in a stronger topology. Reflecting this we make the following assumption. 



Assumption 2 The function p £ converges to po in U°(D) and its gradient converges 
to the gradient of po + £p\ in L°°(D) so that 

]im\\p e -p e a \\ wh „ = 0. 

In Appendix |5.3| we prove this assumption for the one dimensional version of (0. 
The proof in the multidimensional case will be presented elsewhere fl22l . The proof 
of this assumption in the multidimensional case is based on the estimates proved 
in 12 (in particular, Lemma 16), see also lfT31 Lemma 2.1]. 

With these limiting properties of the elliptic problem (0 at hand it is natural to 
ask what is the limiting behaviour of x e governed by ©■ To answer this question 
we define 

dx v (x ) 

~jf = — I — i *0(°) = *init- (!') 

Notice that this ordinary differential equation (ODE) has vector field vo which is 
defined entirely through knowledge of the homogenized permeability Kq: once Ko 
is known, the elliptic PDE (fl"3T l can be solved for po and then vo is recovered from 
(TT3b). If we can show that solutions of (0 and (flTT i are close then this will establish 
that the prediction of particle transport in the model (0, (0 can be made accurately 
by use of only homogenized information about the permeability. 

In proving such a result there are a number of technical issues which arise caused 
by the presence of the boundary D of the domain in which the PDE (0 is posed. In 
particular solutions of (0 may leave D requiring a definition of the velocity field 
outside D. These issues disappear if we consider the case where D is itself a box 
of length L and is equipped with periodic boundary conditions instead of Dirichlet 
conditions: we may then extend all fields to the whole of W 1 by periodicity. In this 
case, the homogenization theory for (0 with ((Tj?) replaced by periodic boundary 
conditions is identical to that given above, except that (fT3b) is also replaced by 
periodic boundary conditions. We write D = (LT) d and adopt this periodic setting 
for the next theorem, which is proved in Appendix [53] 

Theorem 3. Let x e (t) and xo(t) be the solutions to equations (0 and d 1 7b . with 
velocity fields extended from D = {LT) d to W 1 by periodicity, and assume that As- 
sumption\2\holds. Assume also that f £ C°°(D) and that K(x,y) £ C°°(D;C~ r (T d )). 
Then 

limE sup ||jt- £ (f)-jt- (f)j| =0. 
0<r<r 
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In summary, this example exhibits the property that, if the length scale e is small, 
the data generated from K £ and Kq may appear very similar due to homogenization 
effects. Therefore, when trying to infer parameters from data, it is difficult to distin- 
guish between K e and Kq without some form of regularization or prior assumptions 
about the form of the parameter. On the other hand, Theorem|3] shows that knowing 
only Kq is sufficient to make accurate predictions of the trajectories of dS). 



3 Regularization of Inverse Problems 

In this section we describe various approaches to regularizing inverse problems, 
motivating them by reference to the multiscale example in the previous section. The 
approach to regularizing which is described in Section [3~2l is developed in detail in 
0. The Tikhonov regularization approach from Section [3~3l is developed in detail in 
lfT6l[T7l . Both of these regularization approaches are specific examples of the gen- 
eral set-up often called PDE constrained optimization, which we discuss in Section 
13.41 this subject is overviewed in [18|. An overview of the Bayesian approach to 
inverse problems, a subject that we outline in Section |3~31 is given in |26| and 1 17]. 



3.1 Set-Up 

Our objective here is to determine u, given y, where u and y are related by ([T). We 
assume that, whilst the actual value of £, is not available, it is reasonable to view it 
as a single draw from a statistical distribution whose properties are known to us. To 
be concrete we assume that £, is drawn from a mean zero Gaussian random variable 
with covariance F: we write this as t, ~ N(0,r). We make the following continuity 
assumption concerning the observation operator §f . Note that this (local) Lipschitz 
condition also implies an (exponential in ||«||x) bound on \^(u)\. 

Assumption 4 There are constants c\,C2 > such that, for Uj £ X with \\ui\\x < 
r,i = 1,2, 

-#(u 2 )| < c\ exp(c 2 r)||«i -u 2 \\x- 

In general the inverse problems such as that given by (Q~|) with £, = are hard 
to solve: they may have no solutions, multiple solutions and solutions may exhibit 
sensitive dependence on initial data. For this reason it is natural to seek a least 
squares approach to finding functions u which best explain the data. In view of the 
assumed structure on | a natural least squares functional is 



*(«) = 5|y-sr(«)£. 



(18) 
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The weighting by F in the Euclidean norm induces a normalization on the model- 
data mismatch. This normalization is given by the assumed standard deviations of 
the noise in a coordinate system defined by the eigenbasis for F. 

Example 1. Consider the running example of Section [L2l Equation <(5j shows that 
Assumption |4] holds in this case, noting that &(u)j = £j(p) for some linear func- 
tional £j onH l (D), with the choice X =L°°(D;S d ), provided/ e H l . We use this 
example to illustrate why inverse problems are, in general, hard. 

Assume that the linear functionals tj satisfy the property that ij(p e — Po) 
as e — > 0. This occurs if they are linear functionals on L 2 (D), by Theorem[T]or if 
Assumption |2] holds, if they are linear functionals on C(D). Writing this in terms 
of Sf we have \ ( S{u £S ) — ^(uq)\ — > as e — > 0. (Note that this occurs even though 
u £ and mo are not themselves close.) Hence there is an uncountable family of func- 
tions (indexed by all e sufficiently small) which all return approximately the same 
value of <P(u £ ) and thus simply minimizing <P may be very difficult. Furthermore, 
there may be minimizing sequences which do not converge. For example fix a par- 
ticular realization of the data given by y = @(uq) where mo is the homogenized log 
permeability. Then <P(u £ ) > for all £ > and <P(u £ ) — > as £ — > 0, since 

l<£(" £ )l = \\y-^ £ )\r = \w(uo)-^{u £ )\ 2 r (19) 

On the other hand, u £ does not converge in X as £ — > 0. 
□ 

In order to overcome the difficulties demonstrated in this example regularization 
is needed. In the remaining sections we discuss various regularizations, in general, 
illustrating ideas by returning to the running example. 



3.2 Regularization by Minimization Over a Convex, Compact Set 

Recall that £ is a Banach space compactly embedded into X. Let E a & = {u 6 E : 
\\ u \\e < Ol}. Then £ a( j is a closed convex and bounded set in E and, as such, any 
sequence in £ a( j must contain a weakly convergent subsequence with limit in £ a( j 
(see, for example, Theorem 1.17 in lfl8l ). Now consider the minimization problem 

&= inf <P(u). (20) 

HG£ ad 

Theorem 5. Any minimizing sequence {u"}„£z + f or ASP contains a weakly conver- 
gent subsequence in E with limit u € E^ which attains the infimum: <P(u) = <P. 

Proof. This is a classical theorem from the field of optimization; see lfT8l for details 
and context. Since {u"} is contained in E a d we deduce the existence of a subse- 
quence (which for convenience we relabel as {u n }) with weak limit u £ E^. Thus 
m" — m in E. Hence, by compactness, u" — > u in X. By Assumption|4]we deduce that 
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<P : E — > K is weakly continuous. By definition, for any 8 > there exists N = N(8) 
such that 

®<<P(u n )<® + 8, V«>A. 
By weak continuity of <P : E — > M we deduce that 

< 0(77) <0 + <S. 

The result follows since 5 is arbitrary. □ 

Example 2. Consider the running example of Section [L2l Let A denote a fixed sym- 
metric positive-definite tensor A so that log(A) is defined. We define the subspace 
of tensor valued functions of the form u' = w/ + log(A), for some constant «gK 
noting that then exp(V) = exp(t/)A. By Lipschitz continuity of Sf in u' £ X we de- 
duce (abusing notation) Lipschitz continuity of Sf viewed as a function of u S R. We 
define 

E ad = {u£R:\u\<a}. (21) 

We may take the norm || • \\e = \u\. Thus the problem < f20b attains its infimum for 
some 77 G E^. The regularization of seeking to minimize <P over E a ^ corresponds 
to looking for solution over a one-parameter set of tensor fields, in which the free 
parameter is bounded by a. Note that such a solution set automatically rules out the 
oscillating minimizing sequences which were exhibited in ExampleQ] □ 



3.3 Tikhonov Regularization 

Instead of regularizing by seeking to minimize <P over a bounded and convex subset 
of a compact set E in X, we may instead adopt the Tikhonov approach to regular- 
ization. We consider the minimization problem 

I=Ml(u), (22) 

u€E 

where 

'(") = ~ Ml + *(")• (23) 

Theorem 6. Any minimizing sequence {u n y„ e %+ for ( 1221 ) contains a weakly conver- 
gent subsequence in E with limit 77 which attains the infimum: I(u) = I. 

Proof. This is a classical theorem from the calculus of variations; see |[L2| for details 
and context. Since {u n } is a minimizing sequence and <P > 0, we deduce that for 
any 8 > there exists N = N(8) such that 

|lKllI</+5, V«>A. 
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From this it follows that {u"} ne i+ is bounded in E and hence contains a weak 
limit 77, along a subsequence which, for convenience, we relabel as {«"}. The weak 
continuity of <J> : E — > K, together with weak lower semicontinuity of the function 
|| • ||| — s- R implies the weak lower semicontinuity of / : E R. Hence 

7(77) < liminf/(M„) < 7. 
Since 7(77) > 7, the result follows. □ 

Example 3. Consider the running example of Section [L2l Let E = H s (D;S d ) and 
note that E is compact in X = L c °(D;S d ) for s > d/2. Thus the problem d22l attains 
its infimum for some u E E. As with the example from the previous section the 
regularization rules out highly oscillating minimizing sequences such as those seen 
in Example Q] The choice of the parameter X will effect how much oscillation is 
allowed in any minimizing sequence. □ 



3.4 PDE Constrained Optimization 

The regularizations imposed in the two previous subsections involed the imposition 
of constraints on the input u to a PDE model and the resulting minimizations were 
expressed in terms of u alone. For at least two reasons it is sometimes of interest 
to formulate the minimization problem simultaneously over the input variable u, to- 
gether with the solution of the PDE p = G(u) E P: firstly computational algorithms 
which work to find (p, u) infxl can be more effective than working entirely in 
terms ofuEX; and secondly regularization constraints may be imposed on the vari- 
able p as well as on u. If J : P x X — s- M then this leads to constrained minimization 
problems of the form 

min J(p,u): p — G(u), c(p,u) & (24) 
(p,ti)e/>xx 

where denotes the constraints imposed on both the input u and on the output p 
from the PDE model. Typically the observation operator : X — > M. N is found from 
G and then the information in <P can be built into the definition of /. 

Example 4. Consider the running example from Section 11.21 and assume that the 
observational noise % ~ /V(0,y 2 /). Define 

■ / ^ M ) = ^E i I^WI 2 + yll«ll^ + yll/'llp 

for some s > d/2. Choosing k\ = X and X2 = 0, together with c(p,u) = (p,u) and 
J?T = PxX we obtain from d24b the minimization from Example [3] in the case 
r = y 2 I. Choosing X\ = Xo = 0, c(p,u) = (p,u) and Jff = P x E^ from Example 
|2]we recover that example. Choosing X2 7^ and/or choosing the constraint set J(f 
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to impose constraints on p leads to minimization in which the output p of the PDE 
model is constrained as well as the input u that we are trying to estimate. □ 



3.5 Bayesian Regularization 

The preceding regularization approaches have a nice mathematical structure and 
form a natural approach to the inverse problem when a unique solution is to be 
expected. But in many cases it may be interesting or important to find a large class of 
solutions, and to give relative weights to their importance. This allows, in particular, 
for predictions which quantify uncertainty. The Bayesian approach to regularization 
does this by adopting a probabilistic framework in which the solution to the inverse 
problem is a probability measure on X, rather than a single element of X. 

We think of (u, yjeXx M. N as a random variable. Our goal is to find the distri- 
bution of u given y, often denoted by u \y. We define the joint distribution of (u , y) as 
follows. We assume that u and £, appearing in (HJ are indepenent mean zero Gaus- 
sian random variables, supported on X and M. N respectively, with covariance opera- 
tor and covariance matrix F respectively. By equation (HJ, the distribution of y 
given m, denotedy|«, is Gaussian N(&(u),r). The measure Hq=N(0, j c t?) is known 
as the prior measure. It is most natural to define the measure /io on a Hilbert space 
f/DI. Under suitable conditions on c 6\ we have Hq(X) = 1. This means that under 
the measure jUtj, u € X almost surely so that 5f (k) is well-defined, almost surely. If 
Ho{X) = 1, it follows that the Hilbert space E with norm || • \\ E = \\ c 6'~ l l 2 ■ \\ H is 
compactly embedded into X. The space E is known as the Cameron-Martin space. 
In the infinite dimensional setting, functions drawn from (Iq are almost surely not in 
the Cameron-Martin space. See l2l[20) for detailed discussion of Gaussian measures 
on infinite dimensional spaces. 

When solving the inverse problem, the aim is to find the posterior measure 
pL y (du) = P(du\y), and to obtain information about likely candidate solutions to 
the inverse problem from it. Informal application of Bayes' theorem gives 



The probability density function for P(y|«) is, using the property of Gaussians, pro- 
portional to 



The infinite dimensional analogue of this result is to show that jit- v is absolutely 
continuous with respect to /Iq with Radon-Nikodym derivative relating posterior to 
prior as follows: 



Here <P(u) is given by (fl~8T > and Z = J x exp(— <P(u))jj.o(du). The meaning of the 
formula ( f26b is that expectations under the posterior measure fx y can be rewritten 



P(«|y) ~ P(y|«)/lo(«). 



(25) 





(26) 
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as weighted expectations with respect to the prior: for a function & on X we may 
write 

/ ^(u)pL y (du) — / ~exp(-^(K))«F(w);Uo(dw). 
Jx Jx Z 

Theorem 7. ( UllV ) Assume that jUoPO = !• Then fj, y is absolutely continuous with 
respect to /Xo with Radon-Nikodym derivative given by j26h Furthermore the mea- 
sure H y is locally Lipschitz in the data y with respect to the Hellinger metric: there 
is a constant C = C(r), such that, for all y, y' with max { \y | , \y' | } < r, 

d HELL (n y ^ y ')<C\y-y'\. (27) 

If /X, V are probability measures that are absolutely continuous with respect to 
the probability measure p, then the Hellinger metric is defined as 




For any function of u which is square integrable with respect to both jU and v it may 
be shown that the difference in expectations of that function, under and under v, 
is bounded above by the Hellinger distance. In particular, this theorem shows that 
the posterior mean and covariance operators corresponding to data sets y and y' are 
^(ly-y'l) apart. 

The choice of prior jiio, relates directly to the regularization of the inverse prob- 
lem. To see this we note that since the operator c € is necessarily positive and self- 
adjoint we may write down the complete orthonormal system 

Y ^<j> m = al<j) m , meZ+ limc7 m = 0. (28) 

A «!-><*> 

Then u ~ [Iq can be written via the Karhunen-Loeve expansion as 

u ( x ) = a mTlm<hn{x) (29) 

meZ+ 

where the rj„, form an i.i.d. sequence of unit Gaussian random variables. We may 
regularize the inverse problem by modifying the decay rate of a m . For example, 
choosing a m = for m ^ ^ , where C Z + has finite cardinality restricts the 
solution of the inverse problem to a finite dimensional set, and is hence a regulariza- 
tion. More generally, the rate of decay of the a m (which are necessarily summable as 
is trace class) will effect the almost sure regularity properties of functions drawn 
from and, by absolute continuity of \i y with respect to of functions drawn 
from jj, y . 

In the case that X is a subset of H = L?(D) with D C W 1 , the operator c € may be 
identified with an integral operator: 

j( c ^(j))(xi) = J c(xi,x 2 )Q(x 2 )dx2 
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for some kernel c(x\,X2)- The regularity of c(xi,X2) determines the decay rate of 
<5m lfl9l . If *«? = {—A) a then the corresponding measure jiQ has the property that 
samples are almost surely in the Sobolev space H s and in the Holder space C s for 
all s < a — j (see 02) for more details). In particular, if a > d/2, then ju(X) = 1 
whenX =L°°(D). 

Priors which charge functions with a multiscale character can be built in this 
Gaussian context. One natural way to do this is to choose ..W. as above so that it 
contains two distinct sets of functions varying on length scales of 6(X) and &{e) 
respectively. A second natural way is to choose a covariance function c = c £ which 
has two scales. 

The formula d26l l shows quite clearly how regularization works in the Bayesian 
context: the main contribution to the expectation will come from places where <P 
is close to its minimum value and where jio is concentrated; thus minimizing <P is 
important, but this minimization is regularized through the properties of the measure 
j^o- We now develop this intuitive concept further by linking the Bayesian approach 
to Tikhonov regularization and the functional / given by ( |23T >. 

Given ze£ and 8 <C 1 define the small ball probability 

J 5 (z)=¥^(\\u-z\\x<S). 

Note that this ball is in X but centred at a point z <G E, with E (the Cameron-Martin 
space) compact in X. It is natural to ask where J s (z) is maximized as a function 
of z and placing z in E allows us to answer this question. Furthermore we then 
see a connection between the Bayesian approach and the Tikhonov approach to 
regularization. The next theorem shows that small balls centred at minimizers of 
d23l ) will have maximal relative probability under the Bayesian posterior measure, 
in the small ball limit t)—>0. 

Theorem 8. (Ifl4\l) Assume that Ho{X) = 1. Then 

In the Bayesian context the solution of the Tikhonov regularized problem is 
known as the Maximum A Posteriori estimator (MAP estimator) ll7l [T7l . 



4 Large Data Limits 

In the previous section we showed how regularization plays a significant role in 
the solution of inverse problems. Choosing the correct regularization is part of the 
overall modelling scenario in which the inverse problem is embedded, as we demon- 
strated in the running example of Section [TT2l In some situations it may be suitable 
to look for the solution of the inverse problem over a small finite set of parameters, 
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whilst in others it may be desirable to look over a larger, even infinite dimensional 
set, in which oscillations are captured. 

This section is devoted entirely to inverse problems where a single scalar parame- 
ter is sought and we study whether or not this parameter is correctly identified when 
a large amount of noisy data is available. The development is tied specifically to the 
running example, namely the PDE (O. For a fixed permeability coefficient generat- 
ing the data, Fitzpatrick has also studied the consistency and asymptotic normality 
of maximum likelihood estimates in the large data limit ifTTl . Related work on pa- 
rameter estimation in the context stochastic differential equations (SDEs) may be 
found inl25ll23l. 



4.1 The Statistical Model 

We consider the problem of estimating a single scalar parameter u € K in the elliptic 
PDE 

V-v = /, xED, 

p = 0, xe dD, (30) 
v = — exp(u)AVp 

where D C M. d is bounded and open, and / G H~ l as well as the constant sym- 
metric matrix A are assumed to be known. We let G : ffi — >• Hq (D) be defined by 
G(u) = p. Then using the same linear functionals as in the running example from 
Section 11.21 we may construct the observation operator : R — > M. N defined by 
@(u)j = £j(G(u)). Our aim is to solve the inverse problem of determining u given 
y satisfying (Q~|). For simplicity we assume that % ~ A^(0, yl) which implies that 
the observational noise on each linear functional is i.i.d. Af(0,y 2 ). Since u is finite 
dimensional we will simply minimize <P given by (fl~8T >: no further regularization is 
needed because u is already finite dimensional. 

Notice that the solution p of (f30b is linear in exp(— u) and that we may write 
G(u) = exp(—u)p* where p* solves 

V-v = /, x€D, 
p*=0, x G dD. (31) 
v = -AV// 

Note that @(u)j = exp(—u)£j(p*) so that the least squares functional (fT8l i has 
the form 

*(») = sE \yj-*j( u )\ 2 = tL I |y,-exp(-»)^(p*)| 2 - 

It is straightforward to see that <P has a unique minimizer u satisfying 
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E?=i4(p*) 2 



exp(-w) = ^ - - — . (32) 



It is now natural to ask whether, for large N, the estimate u is close to the desired 
value of the parameter. We study two situations: the first where the data is generated 
by the model which is used to fit the data; and the second where the data is generated 
by a multiscale model whose homogenized limit gives the model which is used to 
fit the data. 



4.2 Data From the Homogenized Model 

We define po = exp(— wo)/?* so that p$ solves (l30b with u = Mo- 
Assumption 9 We assume that the data y is generated from noisy observations gen- 
erated by the statistical model: 

yj = £j(po) + Zj 

where form an Ltd. sequence of random variables distributed as N(0, y 2 ). 

Theorem 10. Let Assumptions\9\hold and assume that liminf^oo 1 £* =1 tj (p*) 2 > 
L > as N — > °°. Then ^-almost surely 

lim |exp(— u) — exp(— uq) \ — 0. 

Proof. Substituting the assumed expression for the data from Assumption|9]into the 
formula d32l gives 

exp(-w) = exp(-«o) +h 



where 



Therefore, 



t < 

l!J =l lj(p*) 2 ~ NL 



ttf] = =R T7ZX> ^ ^ (33) 



for N sufficiently large. Since l\ is Gaussian we deduce that E/J" p = &{N~ P ) as 
N — > oo. Application of the Borel-Cantelli lemma shows that I\ converges almost 
surely to zero asJV->=o. □ 

This shows that, in the large data limit, random observational error may be av- 
eraged out and the true value of the parameter recovered, in the idealized scenario 
where the data is taken from the statistical model used to identify the parameter. The 
condition that L > prevents additional observation noise from overwhelming the 
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information obtained from additional measurements as N — > °°. It is a simple explicit 
example of what is known as posterior consistency [8] in the theory of statistics. 



4.3 Data From the Multiscale Model 

In practice, of course, real data does not come from the statistical model used to 
estimate parameters. In order to probe the effect that this can have on posterior con- 
sistency we study the situation where the data is taken from a multiscale model 
whose homogenized limit falls within the class used in the statistical model to esti- 
mate parameters. Again we define po = exp(— uo)p* and we now define p e to solve 
(0 with K £ chosen so that the homogenized coefficient associated with this family 
is Kq = exp(i<o)A. 

Assumption 11 We assume that the data y is generated from noisy observations of 
a multiscale model: 

with p £ as above and the {^j} an i.i.d. sequence of random variables distributed as 
N(0,Y 2 )- 

Theorem 12. Let Asswnptions\TT\hold and assume that that the linear functionals 
lj are chosen so that 




and liminf^^oo jjT!J=i^j{p*) 2 > L > as N — > °°. Then t,— almost surely 
lim lim |exp(— u) — exp(— uq)\ =0. 

£->0A'-!-~ 

Proof. Notice that the solution of the homogenized equation is po = exp(— uo)p*. 
We write 

yj = tj{po)+£j(p e -Po) + Zj 

= exp(-u )£j(p*) + £j{ P £ - po) + %j. 
Substituting this into the formula (|32] i gives 

exp(-w) = exp(-Mo) +/i +/| 
where l\ is as defined in the proof of Theorem[lO]and is independent of e, and 

E _ I.Utj(p £ -Po)ej(p*) 
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The Cauchy-Schwarz inequality gives 



1/2 - \NL f 



(e7=i^(p*) : 



2 XJ " ;=i 



for Af sufficiently large. As in the proof of TheoremfTOlwe have, ^-almost surely, 

lim |exp(— u) — exp(— Un) =0. 

Af-yo 

From this and d34l i the desired result now follows. □ 

The assumption ( f34b encodes the idea that, for small e, the linear functionals 
used in the observation process return nearby values when applied to the solution 
p £ of the multiscale model or to the solution po of the homogenized equation. In 
particular, CorollaryQTJimplies that if {ij(p)}J =i is a family of bounded linear func- 
tionals on L 2 (D), uniformly bounded in j, then (|34| | will hold. On the other hand, 
we may choose linear functionals that are bounded as functionals on H\D) yet un- 
bounded on L 2 (D). In this case Theorem Q] shows that ( [34-b may not hold and the 
correct homogenized coefficient may not be recovered, even in the large data limit. 
An analogous phenomenon occurs in inference for SDEs where if the observations 
of a multiscale diffusion are too frequent (relative to the fast scale) then the correct 
homogenized coefficients are not recovered 



5 Exploiting Multiscale Properties Within Inverse Estimation 

In this section we describe how ideas from homogenization theory can be used to 
improve the estimation of parameters in homogenized models. We consider a regime 
where the unknown parameter has small-scale fluctuations that may be character- 
ized as random. In this case, if we attempt to recover the homogenized parameter 
the error % appearing in (Q~|i is affected by the model mismatch. This is because 
the simplified, low-dimensional parameter used to fit the data is different from the 
true unknown coefficient. So, even when there is no observational noise, the error 
% has a statistical structure. Nevertheless, homogenization theory predicts that this 
discrepancy between G(u) and y associated with model mismatch will have a uni- 
versal statistical structure which can be exploited in the inverse problem, as we now 
describe. 

The specific ideas described here were developed by Nolen and Papanicolaou in 
||2TI for one dimensional elliptic problems, including the groundwater flow prob- 
lem that we study here. Bal and Ren B have employed similar ideas in the study 
of Sturm-Liouville problems with unknown potential. We begin by describing in 
Section [57X1 the homogenization and fluctuation theory for the case that the (scalar) 
permeability k(x) is random. Then, in Section 15721 we show how these ideas can be 
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used to develop an improved estimator for the homogenized permeability coeffi- 
cient. We conclude with numerical results in Section |531 



5.7 The Model 

In this section we will present the approach of ||2T1 in the simplest possible setting. 
We consider the two-point boundary value problem 

' ^xp(u(x))^-) =f(x), xei-1,1], (35a) 



dx \ dx 

p(-l)=p(l) = 0. (35b) 

This is, of course, (f3]) in the one-dimensional setting d = 1 . 

It is assumed that the coefficient k(x) = exp(w(x)) is a single realization of a 
stationary, ergodic and mixing random field k(x, co). Furthermore it is assumed that 
k~ 1 can be decomposed into a slowly varying non-random component, together with 
a random, rapidly oscillating component: 

<Jn(-,co), (36) 



k(x,co) ko(x) \e 

where jj,(x, co) is a stationary, mean zero random field with covariance 

R(x)=EQi{x + y)n(y)). 

We assume that/?(0) = 1 and [ R R(x)dx = 1. Thus, a 2 and e are the (given) variance 
and correlation length of the fluctuations. We are interested in the case where £< 1 
so that the random fluctuations are rapid. 

The solution p ~ p £ (x, co) of (l35l ) depends on e > and on the realization of 
k(x, (o). However, in the limit as e — » 0, p £ coverges to po(x) which is the solution 
of the homogenized Dirichlet problem 

1 'k (x)^-p ) =f(x) 7 ie[-l,l], (37a) 



dx \ dx 

po(-l)=po(l)=0. (37b) 

Observe that the homogenized coefficient is the harmonic mean of k: k$(x) = 
E{k~ l ]~ l . Moreover, in the limit as e — > 0, the solution p £ has Gaussian fluctua- 
tions about its asymptotic limit [3 |. Specifically, one can prove that 

Pe M-p (x) ^ a j^ y . M)vQiy . M)dWy{a) (38) 
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in distribution as e — > 0, where W y (co) is a Brownian random field, which is a Gaus- 
sian process. Here vo(x;£o) = ko(x)po(x), and the kernel Q(x,y;ko) is then related 
to the Green's function for the one dimensional system: 

p x \_(0 l/k (x)\ fp\ f gi 
vj \0 )\v) \g 2 

If the 2 x 2 Green's matrix for this system is G(x,y;ko) :DxD4M 2 ® R 2 , then 
Q(x,y;ko) = Gi,\(x,y;ko). The important point here is that the integral 

I(x,co) = a Q{x,y;k )v (y;k )dW Y {co) 

JD 



which appears on the right side of ( l38l l is a centered Gaussian random variable with 
covariance 

E[/(jc)/(z)] = a 2 / Q(x,yM)vo(yM) 2 Q(y,vM)dy. 

JD 

This covariance depends on ko. The asymptotic theory given by the limit theorem 
( 1381 ) gives us a good approximation of the statistics of p £ (x, co) even when there is no 
observation noise, and shows that the fluctuations depend on ko. In this simple case 
presented here, Q can be computed explicitly. In other cases, it can be computed 
numerically; see ||2T1 for more details. 



5.2 Enhanced Estimation 



We now show how this asymptotic theory can be used to enhance estimation of 
the homogenized parameter ko(x). The inverse problem is to identify the parameter 
ko (x) in the model 

"^r W ^ ) =/W ' (39a) 
Po(-l)=/»o(l)=0. (39b) 

We take the viewpoint that the data actually come from observations of p £ (x,co), 
which is the solution of the multiscale model d35l ) with k(x, CO) given by (136b . so 
there is a discrepancy between the model used to fit the data and the true model 
which generates the data. Now the outstanding modelling issue is the choice of 
statistical model for the error £, in (HJ. 

Suppose we make noisy observations of Pe{xj) at points {xj} 1 J =l distributed 
throughout the domain. Then the measurements are 



yj = Pe(xj,<B) + Zj, j=l,...,N 
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where £y ~ N(0, y 2 ) are mutually independent, representing observation noise. The 
limit d38l l we have just described tells us that for e small, these measurements are 
approximated well by 

yj~Po(xj)+%'j, 

where {^jj-^Lj are Gaussian random variables with mean zero and covariance 

C j4 (k ,e) = E[§j#] = fS jX + ea 2 J D Q(xj,y;k )v (y;ko) 2 Q(x e ,z;k )dy (40) 
Therefore, we model the observations as 

where £f (&o) = po(xj\ko) with po being the solution of (|39l) . The modified statistical 
error has two components. The first term y 2 <S,^ is due to observation error. The 
second term comes from the asymptotic theory and is associated with the random 
microstructure in the true parameter k(x, <X>). Of course, if e is very small, relative 
to y 2 , then the observation noise dominates (l4Qb . In this case, the observations of p £ 
may be very close to observations of the homogenized solution po, and we might 
simply assume that %' ~ N^fl), ignoring the error associated with the model 
mismatch. On the other hand, if y 2 is small relative to e then the statistical error 
if;' is dominated by the model mismatch. In this case, homogenization theory gives 
us an asymptotic approximation of the true covariance structure of which is 
quite different from N(0,y 2 I). See lETl for a discussion of some properties of the 
covariance matrix C(ko,e). 

Using the covariance d40l ). we make the approximation 

ny\ko)*-, * ^(-Uy-^{h)) T c{h-e)-\y-^{k ))), 

^2n\C(k ;e)\ \ ^ ' 

where | • | denotes the determinant. The parameter ko(x) is a function, in general, 
and we may place a Gaussian prior jio on uq(x) = \ogko(x). Application of Bayes' 
theorem ( f25T > (with replacing u) gives that 

P(*db) - -=J==exp(-i(>.-^ )) r C(^o;e)- 1 (3'-^o)))Aio(logA:o) 

where the constant of proportionality is independent of k§. The maximum a poste- 
riori estimator (MAP) is then found as the function ka(x) which maximizes P(^o|y) 
which is the same as minimizing I(ko) = — \n(¥(ko\y)) . The key contribution of ho- 
mogenization theory is to correctly identify the noise structure which has covariance 
C(ko; e) depending on ko(x), the parameter to be estimated. 
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5.3 Numerical Results 



In this section we demonstrate the results of a numerical computation that show 
some advantage to using the homogenization theory as we have just described. 
Given noisy observations of pe(xj) we may compute the MAP estimator k\ using 
d4lT) with covariance C(ko; e) given by d40l >: 



h =argmax, \ exp(-i(y-^o)) r C(fr ; £ )-' (y-#(k )) )^(logk ), 

(41) 

On the other hand, we might ignore the effect of the random microstructure and 
simply use C = J I, accounting only for observation noise: 

h = argmax^ 1 exp(- j-y- 2 |y-^(fr )| 2 Wlogfro). (42) 

y / 27Z\Y 2 I\ v 2 7 

Both estimates k\ and £2 are random variables, depending on the random data ob- 
served, but we should hope that k\ gives us a better approximation of ko, since it 
makes use of the true covariance ( f40b . Indeed for simple linear statistical models, 
it is easy to see that an efficient estimator, which realizes the theoretically optimal 
variance given by the Cramer-Rao lower bound, may be obtained by using the true 
covariance of the data; however, using the incorrect covariance may lead to an es- 
timate with significantly higher variance than the theoretical optimum. See lED for 
more discussion of this point. The present setting is highly nonlinear and the vari- 
ance of the estimates k\ and £2 cannot be computed explicitly, since C(ko,e) depends 
on ko in a nonlinear way through solution of the PDE. Nevertheless the numerical 
results are consistent with the expectation that approximation of the true covariance 
(through homogenization theory) yields a MAP estimator that has smaller variance, 
relative to the estimate that makes no use of the homogenization theory (see Figure 

In Figure Q] we show one realization of the true coefficient k(x, 03) which was 
used to generate the data. The highly-oscillatory graph represents the true coeffi- 
cient k(x,co) with variation on many scales. The slowly-varying harmonic mean 
ko(x) also is displayed here as the thick curve; this function ko is what we attempt 
to estimate. The data was generated as follows. Using one realization of k(x, ft)) 
and given forcing /, we solve the Dirichlet boundary value problem ( f33T >. The ob- 
servation data involves point-wise evaluation of p e (xj) at points {xj}j =1 spaced 
uniformly across the domain, plus independent observation noise N(0, y 2 ) at each 
point of observation. Using this data, we compute estimates k\ and k\ by minimizing 
(|4TT > and d42l . respectively. For the computation shown here, the function ko(x) is 
parameterized by the first three coefficients in a Fourier series expansion. So, com- 
puting k\ and £2 involves an optimization in K 3 . To evaluate P(^o|y) at each step 
in the minimization algorithm, we must solve the forward problem d39t with the 
current estimate of ko, and in the case of £1 we must also compute C(ko,e). See 12T1 
for more details about this computation. 
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Fig. 1 The thin erratic curve is one realization of the true coefficient k £ (x, a). The thick curve is 
the slowly- varying harmonic mean k(, (x) . This realization was used to generate the data. 

Figure[2]compares the estimate k\{x) with the true function Icq(x). Since the esti- 
mate k\ (x) is a random function, we performed the experiment many times (gener- 
ating new k(x, ft)) to compute each estimate k\) and display the results of 100 exper- 
iments. The data for k.2 is qualitatively similar. Nevertheless, the pointwise variance 
Var[fei(x)] is smaller than Var[k2(x)], as shown in Figure[3] This is consistent with 
the linear estimation theory for which knowledge of the true data covariance yields 
an estimate with optimal variance. 

Acknowledgements The authors thank A. Cliffe and Ch. Schwab for helpful discussions con- 
cerning the groundwater flow model. 



Appendix 1 



In this Appendix we prove Theorem[3] which, recall, applies in the case where ^}p) 
and ([T3b) are replaced by periodic conditions onD= (LT) d . 
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Fig. 2 The thick curve is the true ko. The dashed series represent 100 independent realizations of 
the estimate k\ . 



Theorem 13. Let x e (t) and xo(t) be the solutions to equations ^ and d!71 >, with 
velocity fields extended from D = (LT) d to W 1 by periodicity, and assume that As- 
sumption\2\holds. Assume also that f € C°°(D) and that K(x,y) € C°°(D;C^ T (T d )). 
Then 

limE sup \\x £ {t)-x Q (t)\\ =0. 
0<r<r 

Proof. To simplify the notation we will set the porosity of the rock to be equal to 
1, <p = 1. Recall that v £ (x) = K £ (x)Vp £ (x). Our first observation is that, for pf(x) 
given by (O, 

K £ (x)Vp £ (x) =K e {x)Vpl (jc) - 8 £ {x) (43) 

where 

8 £ {x) = -K £ {x)v[p £ {x)~pl{x)). (44) 
From Assumption|2]we deduce that 

lim||5 e (x)]| i -=0. 

E-5-0 

From the definition of pf (jc) it follows that 
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Fig. 3 The upper series (o) is the empirical variance Var[fe(x)]. The lower series (-) is Var[k\ (x)]. 
Both quantities were computed using 500 samples. 

K £ (x) Vpl (x) = Q £ (x) V P0 (x) - eSf (x) 

where 

5f(x) = -K e (x)V xPl (x,x/e), Q £ (x) = Q{x,x/e). (45) 
From the definition of pi in (fl4b we see that 

||5fWlk-<C. 
Putting (l43l l and d45b together we see that 

v £ {x) = -Q e (x)V P0 (x) + 8 e (x) + eSf(x) 

and we see from (l44l > and (05]) that the perturbations of v E (x) from Q £ (x)'Vpo(x) 
are small; it is thus natural to expect a limit theorem for x £ solving <[8J which is 
Lagrangian transport in an appropriately averaged version of Q £ (jc)V po(x) . Further- 
more, since Q{x,y) is divergence free in the fast y coordinate, by ( fTBI ). it is natural to 
expect that the appropriate average is Lebesgue measure. We now demonstrate that 
this is indeed the case. 
From ([8]) we deduce that 
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x £ (f)=x(0)+ [' (-Q £ (x)Vp Q (x(s)) + 8 £ (x( s )) + e8f( x (s)))ds + ^o~iw(t). 
Jo 

(46) 

Define now V(x,y) = — Q(x,y)"Vpo(x) and consider the system of SDEs 

- = (V(x,y) + 8 £ (x) + e8f(x)) + (47a) 



l4(v(„, +5 « W ) +5fw+ ^f. 

Since y = x/e we see that x(t), the solution of ( f4Tb is equal to x £ (t) appearing in 
The process {x(t), y(t)} is Markov with generator 

J2f = ^((V(jc,y) + 5 e (x))-V, + 77o4y 

V(jc,y) + <S £ (x)) • V A - + 5f (x) • V v + 770 V,. • V v + T] V, • V, 
+er] A x + e8f(x)-V x 
=: i (j2f + 5 £ (x) ■ V y ) + JSfj + eJ? 2 . 

Consider now the Poisson equation 

-J2&* = V(*oO-v (jc) (48) 

with (see (OfcV) 

M x ) = / ,y( x ,y)dy- 

Jr d 

Equation ( |48l is posed on with periodic boundary conditions. Notice that x enters 
merely as a parameter in this equation. The operator Jz?o is uniformly elliptic on T d 
and the right hand side averages to 0, hence, by Fredholm's alternative this equation 
has a solution which is unique, up to constants. We fix this constant by requiring 
that Jjd <P(x,y)dy = 0. We define <J> £ (x) := <P(x,x/e) and similarly for 3fj<P e (x). 
Applying Ito's formula to <£> and evaluating at y = x/e we obtain 

d **(x) = I(^*» + «« W .V,*(,,,/e))* + ^& + i^*»A 

+\j^-V y <P £ dW + y/2r}olV x e dW 

= - - (V (x,x/e) - v (x) + c5 £ (x) • V,*(x,x/e)) * + ^i<£ £ + e^ 2 <£ £ A 

^ V y <P e dW + v / 2rjo~eV x <I> £ dW. 

Consequently, 
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t i-t 
V(x(s),y(s))ds- / v Q (x(s))ds 
Jo 

= £ (s e {x(s)) ■ V y cP(x(s)Ms)/£)+£^i^ £ (x(s))+E 2 ^ 2 cP £ (x(s))^ ds 
-e(0 £ (x £ (t))-0 £ {x £ {O))^j+V^M E {t), 

where 

M £ (t ) : = J ( V2J?o v v ^ £ + e V^o V. v <P £ ^)dW. 
Since y) is periodic in both coordinates we have that 

||V,*(jt,x/e)lk-<C, \\<P £ (x)\\ L ~<C, HJSfi^llL-^C, \\J?i<P £ \\l-<C 
and 

E||M £ (f)j| p <C, p>l. (49) 
We combine the above calculations to obtain 

rt 



x £ (t)=x(0)+ / v {x £ (s))ds + H £ (t) + VeM £ (t), 
Jo 

where 

H £ (t) := - £ (V(x £ (f))-<l> £ (x £ (0))) + j\8 £ (x £ {s)) + eSf{x £ (s)))ds 

V (x(s)) ■ V y 0(x(s),x{s) /e) + e&i <P £ (x(s)) + £ 2 Jz? 2 <£ £ (*(*)) ) ds 



and 

M £ (t) =M £ (t) + ^/2rj W(t). 

Our estimates imply that 

limE sup \H £ (t)\ = 0. 
te[o.T] 

Furthermore, estimate d49b . together with the Burkholder-Davis-Gundy inequality 
imply that 

E sup \M £ (t) \ <C. 
te[0,T] 



On the other hand, 



x(t) =x(0)+ [ v (x(s))ds. 
Jo 



Set 9(T) : = Esup, e [ 07 -] |.Y £ (f) — x(t)\. Because vo is periodic it is in fact globally 
Lipschitz so that we obtain 



T 



9{T)<C B(t)dt + h £ (T), 
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where 

lim/z £ (r) =0. 

£-5-0 

We use Gronwall's inequality to deduce 

0(T) <h e (l+CTe CT ), 
from which the claim follows. □ 



Appendix 2 

In this appendix we study the homogenization problem (O in one dimension. In this 
case we can calculate the homogenized coefficient explicitly and to prove Assump- 
tion|2] More details can be found in l24l Ch. 12]. 



The Homogenized Equations 

We take d = 1 in (|7]i and set D = [0,L]. Then the Dirichlet problem ^} reduces to a 
two-point boundary value problem: 

-S(« p ("(*'f))¥) =/ «* X e(0,L), W 
p e (0)=p e (L) = 0. (50b) 

We assume that u(x,y) is smooth in both of its arguments and periodic in y with 
period 1. Furthermore, we assume that this function is bounded from above and 
below. Consequently, there exist constants < a < j3 < °° such that 

a<exp( M (x,y))<j3, Vye[0,l]. (51) 

We also assume that / is smooth. 

The cell problem becomes a boundary value problem for an ordinary differen- 
tial equation with periodic boundary conditions. Introducing the notation k(x,y) := 
exp(u(x,y)), the cell problem can be written as 

X is 1 -periodic, f %{ x ,y)dy = 0. (52b) 
Jo 

Notice that the macrovariable x enters the cell problem d52l as a parameter. Since 
d = 1 we only have one effective coefficient which is given by the one dimensional 
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version of (fTTI).(fT2li. namely 

ko(x) = [k{x,y)+k(x,y)^{x,y) S j dy 

k(x,y)(l + ^(x,y)\\ (53) 

where we have introduced the notation (<j)(x,y)) := J (j)(x,y)dy. The homogenized 
equation is then 

- ~T (^(x)^) =f, xe (0,L), (54a) 



dx \ dx 

p(0)=p(L) = 0. (54b) 



Explicit Solution of the Cell Problem 



Equation ( 15 2 at can be solved exactly. After integrating the equation and applying 
the periodic boundary conditions, we obtain 



n l 

X{x,y) = -y + ci / — rdy + c 2 , 

Jo k{x,y) 



with 



ci{x) = fl i - =(k(x,y)- 1 )- 1 . 



Therefore, from (153I I we obtain: 

k (x) = (k{x,y)- 1 }- 1 - (55) 

The constant c 2 is irrelevant. This is the formula which gives the homogenized coef- 
ficient in one dimension. It shows clearly that, even in this simple one-dimensional 
setting, the homogenized coefficient is not found by simply averaging the unhomog- 
enized coefficients over a period of the microstructure. Rather, the homogenized co- 
efficient is the harmonic average of the unhomogenized coefficient. It is quite easy 
to show that ko(x) is bounded from above by the average of k(x,y). Notice that the 
homogenized coefficient can be written in the form 



ko(x)=e*>to } where «o(jc) = log ((exp(-u(x,y)))- 1 ) . (56) 
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The fact that we can obtain an explicit formula for the solution of the boundary 
value problem ( l50b as well as for the solution of the cell problem 02] ) enables us to 
prove Assumption^ 

Proposition 14 Let p £ (x) be the solution of the two-point boundary value prob- 
lem (1501 ) where the log permeability u(x,y) is smooth in both of its arguments and 
satisfies ( 15 It . Let k(x,y) = exp(u(x,y)) and define 

x\ dp £ 



and 

dX , s\dpo, 
^— x,v — — [X 



V(x,y)=k(x,y) \l + -±(x,y)j-gi( x ), 
where po(x) is the solution of the homogenized equation ( 154b . Then 

]im\\v e (x)-V(x,x/e)\\ L ~ =0. (57) 

£-5-0 

Notice that, by (TBI ), the corrector pi{x,y) — x{x,y)^-{x). Hence, using the 
bound ( |5"T| ) from below on a, together with the definition (0 of pf, this theorem 
delivers the following immediate corollary: 

Corollary 2. Under the assumptions of Proposition\l4\we have 

lim||p e -pf|| w i,c=0. 

£^0 

Proof of PropositionUH We have that 

^&y) = -l-' ko(x) 



dy ' k(x,y) ' 

Consequently 

V(x,y)=k (x)^(x). 

dx 

Define a function F by F'(z) = f(z). We solve the homogenized equation to obtain 

k (x) £ ^(x) = -F(x)+c, 

with 

c _ Jok 1 (z)F(z)dz 
J Q L k l (z)dz 

Similarly, from (l50l we obtain 
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with 

c£ ^ Jo L k-\z,z/E)F(z)dz 
f L k-i(z,z/e)dz 
From the above calculations we deduce that 

\\v £ (x)-V(x, X /e)\\ L ~ = \c-c% 

It suffices to show that \c — c £ \ = ff{e). This will follow from the fact that 

f L k-\z,z/e)G(z) = [ L k Q l (z)G(z)dz+m 
Jo Jo 

for any smooth function G, as e — >• 0. To see this, define integer N and 8 E [0, e) 
uniquely by the identity 

L = Ne + S. (58) 

Then note that, using the uniform bounds on k(x,y) from below, together with uni- 
form (iny) Lipschitz properties of a(-,y) and G, we have forz„ = ne, 

N-l r (n+l)s 



n=0- 
N-l r (n+l)e 



[ k-\z,z/e)G(z)dz= T f 1 k- l ( Zn ,z/e)G(z n )dz + 0(e) 

JO „_n Jne 

k Q \zn)G{z n )dz + ff{e) 
£ / k Q \z)G{z)dz + m 
' k o l (z)G(z)dz+0(e). 



N-l r (n+l)e 



This completes the proof. □ 
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